Authors
Szymon Grabowski
Abstract
Web logs record client activity on a particular server, usually as one-line “hits” containing information such as the client’s IP address, date and time, requested file or query, and download size in bytes. Web logs of popular sites may grow by hundreds of megabytes a day, or even more. It makes sense to archive old logs so that they can be analyzed later, e.g. to detect attacks or other server abuse patterns. In this work we present a specialized lossless Apache web log preprocessor and test it in combination with several popular general-purpose compressors. The test results show that the proposed transform improves the compression efficiency of general-purpose compressors by 65% on average for gzip and by 52% for bzip2.
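The abstract does not describe the transform itself, so the following is only a minimal sketch of the general idea of a lossless, reversible preprocessing step placed in front of a general-purpose compressor. It is not the authors' preprocessor. The column-wise regrouping of the client IP field, the function names `forward`, `inverse` and `compressed_sizes`, and the sample log lines are all assumptions made for illustration. The sketch also assumes that every log line contains at least one space and no newline.

```python
import bz2
import gzip

# Hypothetical sample lines in Apache Common Log Format (illustration only).
SAMPLE = [
    '192.168.0.1 - - [10/Oct/2007:13:55:36 +0200] "GET /index.html HTTP/1.0" 200 2326',
    '192.168.0.1 - - [10/Oct/2007:13:55:37 +0200] "GET /logo.png HTTP/1.0" 200 4511',
    '10.0.0.7 - - [10/Oct/2007:13:55:39 +0200] "GET /index.html HTTP/1.0" 304 0',
]


def forward(lines):
    """Toy reversible transform: store all first fields (client IPs)
    together, followed by the remainders of all lines.

    Assumes each line contains at least one space and no newline.
    """
    heads, tails = [], []
    for line in lines:
        head, _, tail = line.partition(" ")
        heads.append(head)
        tails.append(tail)
    # The first output line holds the line count so that the inverse
    # knows where the IP column ends and the remainder column begins.
    return "\n".join([str(len(lines))] + heads + tails)


def inverse(blob):
    """Undo forward(): rebuild the original list of log lines."""
    parts = blob.split("\n")
    n = int(parts[0])
    heads = parts[1:1 + n]
    tails = parts[1 + n:1 + 2 * n]
    return [head + " " + tail for head, tail in zip(heads, tails)]


def compressed_sizes(text):
    """Return the compressed size in bytes under gzip and bzip2."""
    data = text.encode("utf-8")
    return {
        "gzip": len(gzip.compress(data)),
        "bzip2": len(bz2.compress(data)),
    }


if __name__ == "__main__":
    raw = "\n".join(SAMPLE)
    transformed = forward(SAMPLE)
    # Losslessness check: the inverse must reproduce the input exactly.
    assert inverse(transformed) == SAMPLE
    print("raw:        ", compressed_sizes(raw))
    print("transformed:", compressed_sizes(transformed))
```

On a three-line sample the sizes say little; any real comparison would need large logs, and whether this toy regrouping helps at all is not something the abstract lets us claim.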
Similar resources
A general compression algorithm that supports fast searching
The task of compressed pattern matching [2] is to report all the occurrences of a given pattern P in a text T available in compressed form. Certain compression algorithms allow for searching without prior decoding, which may be practical, especially if the search is faster than in the non-compressed representation. Most of the known schemes, however, either assume a text formed into words, or are...
Multiple Pattern Matching Revisited
We consider the classical exact multiple string matching problem. Our solution is based on q-grams combined with pattern superimposition, bit-parallelism and alphabet size reduction. We discuss the pros and cons of the various alternatives for achieving the best combination. Our method is closely related to previous work by Salmela et al. (2006). The experimental results show that our method p...
Preprocessing for Real-Time Handwritten Character Recognition
We present a real-time on-line handwritten character recognition system based on an ensemble of neural networks. In this work we focus on the preprocessing algorithms we developed, which help achieve a high accuracy rate without a visible delay in the recognition process.
Simple Techniques for Plagiarism Detection in Student Programming Projects
In this paper we deal with the problem of stolen program code. What is specific to plagiarism attempts on a programmer's work is that in most programming languages it is very easy to change the “look” of a piece of code without changing its semantics at all. Basically, plagiarism detection algorithms look at either the code structure or just specific phrases. We experiment with the latt...
Publication date: 2007